Skip to content

Conversation

@midhuncodes7
Copy link
Contributor

Implements (SECTION_TYPE)(+offset) syntax for XCOFF in llvm-symbolizer to support section-relative addressing on AIX. This enables sanitizers to symbolize addresses without knowing section names by using standardized section types (TEXT, DATA, BSS, etc.).

The implementation parses the new syntax, maps section types to XCOFF STYP flags, and converts section-relative offsets to absolute VMAs using the formula: VMA = section_base + offset.

Implementation detail:
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:

  • Parse (SECTION_TYPE)(+offset) syntax for XCOFF (e.g., (TEXT)(+0x100))
  • Map section types (TEXT, DATA, BSS, TDATA, TBSS, etc.) to XCOFF STYP flags
  • Convert section-relative offsets to absolute VMAs: VMA = section_base + offset
  • Validate syntax with helpful error messages

Tests:

  • xcoff-section-relative.ll: Functional test with actual symbol addresses
  • xcoff-section-syntax.test: Syntax validation and error handling

@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Midhunesh (midhuncodes7)

Changes

Implements (SECTION_TYPE)(+offset) syntax for XCOFF in llvm-symbolizer to support section-relative addressing on AIX. This enables sanitizers to symbolize addresses without knowing section names by using standardized section types (TEXT, DATA, BSS, etc.).

The implementation parses the new syntax, maps section types to XCOFF STYP flags, and converts section-relative offsets to absolute VMAs using the formula: VMA = section_base + offset.

Implementation detail:
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:

  • Parse (SECTION_TYPE)(+offset) syntax for XCOFF (e.g., (TEXT)(+0x100))
  • Map section types (TEXT, DATA, BSS, TDATA, TBSS, etc.) to XCOFF STYP flags
  • Convert section-relative offsets to absolute VMAs: VMA = section_base + offset
  • Validate syntax with helpful error messages

Tests:

  • xcoff-section-relative.ll: Functional test with actual symbol addresses
  • xcoff-section-syntax.test: Syntax validation and error handling

Full diff: https://github.com/llvm/llvm-project/pull/168524.diff

3 Files Affected:

  • (added) llvm/test/tools/llvm-symbolizer/xcoff-section-relative.ll (+51)
  • (added) llvm/test/tools/llvm-symbolizer/xcoff-section-syntax.test (+31)
  • (modified) llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp (+151-8)
diff --git a/llvm/test/tools/llvm-symbolizer/xcoff-section-relative.ll b/llvm/test/tools/llvm-symbolizer/xcoff-section-relative.ll
new file mode 100644
index 0000000000000..d1e21fe135e9e
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/xcoff-section-relative.ll
@@ -0,0 +1,51 @@
+;; Test section-relative address syntax for XCOFF
+;; The syntax (SECTION_TYPE)(+offset) represents: offset from section base
+
+; REQUIRES: system-aix
+; RUN: llc -filetype=obj -o %t -mtriple=powerpc-aix-ibm-xcoff -function-sections < %s
+
+;; Test 1: Symbolize .foo using section-relative offset
+; RUN: llvm-nm --numeric-sort %t | grep " T \.foo$" | awk '{printf "CODE (TEXT)(+0x%%s)", $1}' > %t.foo_query
+; RUN: llvm-symbolizer --obj=%t @%t.foo_query | FileCheck %s --check-prefix=TEST-FOO
+
+;; Test 2: Symbolize .bar using section-relative offset
+; RUN: llvm-nm --numeric-sort %t | grep " T \.bar$" | awk '{printf "CODE (TEXT)(+0x%%s)", $1}' > %t.bar_query
+; RUN: llvm-symbolizer --obj=%t @%t.bar_query | FileCheck %s --check-prefix=TEST-BAR
+
+;; Test 3: Symbolize global_var using section-relative offset in DATA section
+; RUN: llvm-readobj --sections %t | awk '/Name: \.data/{found=1} found && /VirtualAddress:/{print $2; exit}' > %t.data_base
+; RUN: llvm-nm --numeric-sort %t | grep " D global_var$" | awk '{print $1}' > %t.global_var_vma
+; RUN: sh -c 'printf "%%d\n" $(cat %t.data_base)' > %t.data_base_dec
+; RUN: sh -c 'printf "%%d\n" 0x$(cat %t.global_var_vma)' > %t.global_var_dec
+; RUN: awk 'NR==FNR{base=$1; next} {vma=$1; printf "DATA (DATA)(+0x%%x)", vma-base}' %t.data_base_dec %t.global_var_dec > %t.data_query
+; RUN: llvm-symbolizer --obj=%t @%t.data_query | FileCheck %s --check-prefix=TEST-DATA
+
+;; Test 4: Verify section structure with llvm-readobj
+; RUN: llvm-readobj --sections %t | FileCheck %s --check-prefix=SECTIONS
+
+define void @foo() {
+entry:
+  ret void
+}
+
+define void @bar() {
+entry:
+  ret void
+}
+
+@global_var = global i32 42, align 4
+
+;; Verify correct symbolization with section-relative syntax
+; TEST-FOO: .foo
+; TEST-FOO-NEXT: ??:0:0
+
+; TEST-BAR: .bar
+; TEST-BAR-NEXT: ??:0:0
+
+; TEST-DATA: global_var
+
+;; Verify XCOFF sections exist with correct types
+; SECTIONS: Name: .text
+; SECTIONS: Type: STYP_TEXT
+; SECTIONS: Name: .data
+; SECTIONS: Type: STYP_DATA
diff --git a/llvm/test/tools/llvm-symbolizer/xcoff-section-syntax.test b/llvm/test/tools/llvm-symbolizer/xcoff-section-syntax.test
new file mode 100644
index 0000000000000..ca5ef9d3cb2cc
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/xcoff-section-syntax.test
@@ -0,0 +1,31 @@
+## Test section-relative address syntax parsing for XCOFF
+## This tests that the (SECTION_TYPE)(+offset) syntax produces appropriate
+## error messages for invalid syntax
+
+# REQUIRES: system-aix
+
+## Create a simple XCOFF object for testing
+# RUN: echo "define void @test() { ret void }" | \
+# RUN:   llc -filetype=obj -mtriple=powerpc-aix-ibm-xcoff -o %t.o
+
+## Test invalid section type
+# RUN: llvm-symbolizer --obj=%t.o '(INVALID)(+0x10)' 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=INVALID-TYPE
+
+## Test missing '+' sign
+# RUN: llvm-symbolizer --obj=%t.o '(TEXT)(0x10)' 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=NO-PLUS
+
+## Test invalid offset value (not a hex number)
+# RUN: llvm-symbolizer --obj=%t.o '(TEXT)(+abc)' 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=INVALID-OFFSET
+
+## Test empty section type
+# RUN: llvm-symbolizer --obj=%t.o '()(+0x10)' 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=EMPTY-SECTION
+
+## Verify error messages are helpful
+# INVALID-TYPE: unknown section type
+# NO-PLUS: section-relative offset must start with '+'
+# INVALID-OFFSET: invalid offset in section-relative address
+# EMPTY-SECTION: unknown section type
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index 4784dafeb2948..3bdbce55c4f68 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -17,15 +17,19 @@
 #include "Opts.inc"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
+#include "llvm/BinaryFormat/XCOFF.h"
 #include "llvm/Config/config.h"
 #include "llvm/DebugInfo/Symbolize/DIPrinter.h"
 #include "llvm/DebugInfo/Symbolize/Markup.h"
 #include "llvm/DebugInfo/Symbolize/MarkupFilter.h"
 #include "llvm/DebugInfo/Symbolize/SymbolizableModule.h"
 #include "llvm/DebugInfo/Symbolize/Symbolize.h"
+#include "llvm/Object/XCOFFObjectFile.h"
 #include "llvm/Debuginfod/BuildIDFetcher.h"
 #include "llvm/Debuginfod/Debuginfod.h"
 #include "llvm/Debuginfod/HTTPClient.h"
+#include "llvm/Object/XCOFFObjectFile.h"
 #include "llvm/Option/Arg.h"
 #include "llvm/Option/ArgList.h"
 #include "llvm/Option/Option.h"
@@ -157,11 +161,97 @@ static Error makeStringError(StringRef Msg) {
   return make_error<StringError>(Msg, inconvertibleErrorCode());
 }
 
+// Helper function to get XCOFF section type flag from string
+ static std::optional<XCOFF::SectionTypeFlags> parseXCOFFSectionType(StringRef TypeStr) {
+   return StringSwitch<std::optional<XCOFF::SectionTypeFlags>>(TypeStr)
+       .Case("PAD", XCOFF::STYP_PAD)
+       .Case("DWARF", XCOFF::STYP_DWARF)
+       .Case("TEXT", XCOFF::STYP_TEXT)
+       .Case("DATA", XCOFF::STYP_DATA)
+       .Case("BSS", XCOFF::STYP_BSS)
+       .Case("EXCEPT", XCOFF::STYP_EXCEPT)
+       .Case("INFO", XCOFF::STYP_INFO)
+       .Case("TDATA", XCOFF::STYP_TDATA)
+       .Case("TBSS", XCOFF::STYP_TBSS)
+       .Case("LOADER", XCOFF::STYP_LOADER)
+       .Case("DEBUG", XCOFF::STYP_DEBUG)
+       .Case("TYPCHK", XCOFF::STYP_TYPCHK)
+       .Case("OVRFLO", XCOFF::STYP_OVRFLO)
+       .Default(std::nullopt);
+ }
+
+ // Find the base VMA of the first section matching the given type for XCOFF.
+ // The syntax (SECTION_TYPE)(+offset) represents an offset from the section base,
+ // so we return the section's base address to compute: VMA = base + offset.
+ static Expected<uint64_t> getXCOFFSectionBaseAddress(
+     const object::XCOFFObjectFile *XCOFFObj,
+     XCOFF::SectionTypeFlags TypeFlag) {
+
+   for (const auto &Section : XCOFFObj->sections()) {
+     DataRefImpl SecRef = Section.getRawDataRefImpl();
+     int32_t Flags = XCOFFObj->getSectionFlags(SecRef);
+
+     if ((Flags & 0xFFFF) == TypeFlag) {
+       return Section.getAddress();
+     }
+   }
+
+   return createStringError(inconvertibleErrorCode(),
+                            "section type not found in XCOFF object");
+ }
+
+ static Expected<uint64_t> validateSectionType(StringRef ModulePath,
+                                                 StringRef SectionType,
+                                                 uint64_t &Offset,
+                                                 LLVMSymbolizer &Symbolizer) {
+   // Parse the section type string
+   auto SectionTypeFlag = parseXCOFFSectionType(SectionType);
+   if (!SectionTypeFlag) {
+     return createStringError(inconvertibleErrorCode(),
+                             "unknown section type: " + SectionType.str());
+   }
+
+   // Get the module info to access the object file
+   auto ModuleOrErr = Symbolizer.getOrCreateModuleInfo(ModulePath);
+   if (!ModuleOrErr) {
+     return ModuleOrErr.takeError();
+   }
+
+   auto BinaryOrErr = object::createBinary(ModulePath);
+   if (!BinaryOrErr) {
+     return BinaryOrErr.takeError();
+   }
+
+   object::Binary *Binary = BinaryOrErr->getBinary();
+   if (auto *XCOFFObj = dyn_cast<object::XCOFFObjectFile>(Binary)) {
+     // Get the base VMA of the section matching the type
+     auto SectionBaseOrErr = getXCOFFSectionBaseAddress(XCOFFObj, *SectionTypeFlag);
+     if (!SectionBaseOrErr)
+       return SectionBaseOrErr.takeError();
+
+     uint64_t SectionBase = *SectionBaseOrErr;
+     uint64_t SectionRelativeOffset = Offset;
+
+     // Convert section-relative offset to absolute VMA
+     // VMA = section_base + offset
+     Offset = SectionBase + SectionRelativeOffset;
+
+     // Return UndefSection - XCOFF symbolizer doesn't support SectionedAddress,
+     // so we use absolute VMA addressing instead.
+     return object::SectionedAddress::UndefSection;
+   }
+
+   return createStringError(inconvertibleErrorCode(),
+                           "section type syntax is only supported for XCOFF objects");
+ }
+
 static Error parseCommand(StringRef BinaryName, bool IsAddr2Line,
                           StringRef InputString, Command &Cmd,
                           std::string &ModuleName, object::BuildID &BuildID,
-                          StringRef &Symbol, uint64_t &Offset) {
+                          StringRef &Symbol, uint64_t &Offset,
+                          StringRef &SectionType) {
   ModuleName = BinaryName;
+  SectionType = StringRef();
   if (InputString.consume_front("CODE ")) {
     Cmd = Command::Code;
   } else if (InputString.consume_front("DATA ")) {
@@ -245,10 +335,43 @@ static Error parseCommand(StringRef BinaryName, bool IsAddr2Line,
         AddrSpec.consume_front_insensitive("+0x");
   }
 
+  // Check for section-relative address syntax: (SECTION_TYPE)(+0x0)
+   if (AddrSpec.starts_with("(")) {
+     size_t FirstClose = AddrSpec.find(')');
+     if (FirstClose != StringRef::npos && FirstClose + 1 < AddrSpec.size() &&
+         AddrSpec[FirstClose + 1] == '(') {
+       size_t SecondOpen = FirstClose + 1;
+       size_t SecondClose = AddrSpec.find(')', SecondOpen);
+       if (SecondClose != StringRef::npos) {
+         // Extract section type from first parentheses
+         SectionType = AddrSpec.substr(1, FirstClose - 1);
+
+         // Validate that section type is not empty
+         if (SectionType.empty())
+           return makeStringError("unknown section type: empty section type");
+
+         // Extract offset from second parentheses
+         StringRef OffsetPart = AddrSpec.substr(SecondOpen + 1, SecondClose - SecondOpen - 1);
+
+         // The offset should start with '+'
+         if (!OffsetPart.consume_front("+"))
+           return makeStringError("section-relative offset must start with '+'");
+
+         // Parse the offset - auto-detect base (0x prefix = hex, otherwise decimal)
+         if (OffsetPart.getAsInteger(0, Offset))
+           return makeStringError("invalid offset in section-relative address");
+
+         Symbol = StringRef();
+         return Error::success();
+       }
+     }
+   }
+
   // If address specification is a number, treat it as a module offset.
   if (!AddrSpec.getAsInteger(IsAddr2Line ? 16 : 0, Offset)) {
     // Module offset is an address.
     Symbol = StringRef();
+    SectionType = StringRef();
     return Error::success();
   }
 
@@ -260,6 +383,7 @@ static Error parseCommand(StringRef BinaryName, bool IsAddr2Line,
   // Otherwise it is a symbol name, potentially with an offset.
   Symbol = AddrSpec;
   Offset = 0;
+  SectionType = StringRef();
 
   // If the address specification contains '+', try treating it as
   // "symbol + offset".
@@ -282,10 +406,11 @@ template <typename T>
 void executeCommand(StringRef ModuleName, const T &ModuleSpec, Command Cmd,
                     StringRef Symbol, uint64_t Offset, uint64_t AdjustVMA,
                     bool ShouldInline, OutputStyle Style,
-                    LLVMSymbolizer &Symbolizer, DIPrinter &Printer) {
-  uint64_t AdjustedOffset = Offset - AdjustVMA;
-  object::SectionedAddress Address = {AdjustedOffset,
-                                      object::SectionedAddress::UndefSection};
+                    LLVMSymbolizer &Symbolizer, DIPrinter &Printer,
+                    uint64_t SectionIndex) {
+   uint64_t AdjustedOffset = Offset - AdjustVMA;
+   object::SectionedAddress Address = {AdjustedOffset, SectionIndex};
+
   Request SymRequest = {
       ModuleName, Symbol.empty() ? std::make_optional(Offset) : std::nullopt,
       Symbol};
@@ -342,6 +467,7 @@ static void symbolizeInput(const opt::InputArgList &Args,
   object::BuildID BuildID(IncomingBuildID.begin(), IncomingBuildID.end());
   uint64_t Offset = 0;
   StringRef Symbol;
+  StringRef SectionType;
 
   // An empty input string may be used to check if the process is alive and
   // responding to input. Do not emit a message on stderr in this case but
@@ -352,24 +478,41 @@ static void symbolizeInput(const opt::InputArgList &Args,
   }
   if (Error E = parseCommand(Args.getLastArgValue(OPT_obj_EQ), IsAddr2Line,
                              StringRef(InputString), Cmd, ModuleName, BuildID,
-                             Symbol, Offset)) {
+                             Symbol, Offset, SectionType)) {
     handleAllErrors(std::move(E), [&](const StringError &EI) {
       printError(EI, InputString);
       printUnknownLineInfo(ModuleName, Printer);
     });
     return;
   }
+
+  // Validate section index from section type if specified
+  uint64_t SectionIndex = object::SectionedAddress::UndefSection;
+  if (!SectionType.empty() && !ModuleName.empty()) {
+    auto SectionIndexOrErr = validateSectionType(ModuleName, SectionType, Offset, Symbolizer);
+    if (!SectionIndexOrErr) {
+      handleAllErrors(SectionIndexOrErr.takeError(), [&](const ErrorInfoBase &EI) {
+        printError(EI, InputString);
+      });
+      printUnknownLineInfo(ModuleName, Printer);
+      return;
+    }
+    SectionIndex = *SectionIndexOrErr;
+  }
+
   bool ShouldInline = Args.hasFlag(OPT_inlines, OPT_no_inlines, !IsAddr2Line);
   if (!BuildID.empty()) {
     assert(ModuleName.empty());
     if (!Args.hasArg(OPT_no_debuginfod))
       enableDebuginfod(Symbolizer, Args);
     std::string BuildIDStr = toHex(BuildID);
+    // Note: Section type resolution is not supported for BuildID-based lookup
     executeCommand(BuildIDStr, BuildID, Cmd, Symbol, Offset, AdjustVMA,
-                   ShouldInline, Style, Symbolizer, Printer);
+                   ShouldInline, Style, Symbolizer, Printer,
+                    object::SectionedAddress::UndefSection);
   } else {
     executeCommand(ModuleName, ModuleName, Cmd, Symbol, Offset, AdjustVMA,
-                   ShouldInline, Style, Symbolizer, Printer);
+ShouldInline, Style, Symbolizer, Printer, SectionIndex);
   }
 }
 

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@midhuncodes7
Copy link
Contributor Author

@jakeegan @hubert-reinterpretcast - I couldn't add any reviewers to the PR, hence tagging you here to review the PR

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 186358 tests passed
  • 4857 tests skipped

@midhuncodes7 midhuncodes7 force-pushed the midhun7/symbolizer-section-relative-syntax branch from 21083e1 to efd410e Compare November 19, 2025 07:28
@midhuncodes7 midhuncodes7 force-pushed the midhun7/symbolizer-section-relative-syntax branch 2 times, most recently from 8b105a1 to 5b83ce5 Compare November 19, 2025 08:31
@midhuncodes7 midhuncodes7 force-pushed the midhun7/symbolizer-section-relative-syntax branch from 5b83ce5 to d837ef6 Compare November 19, 2025 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants